
[docs] kernels#13139

Merged
stevhliu merged 2 commits into huggingface:main from stevhliu:kernels
Mar 25, 2026

Conversation

@stevhliu
Member

adds a kernels section to the Accelerate inference docs, along with the benchmark results:

  • cross-linked to the Attention backends docs, which demonstrate support for loading attention kernels with set_attention_backend
  • defer to the blog post and pipeline integration guide for details on implementing non-attention kernels, since that is more involved and already well-documented there

@stevhliu stevhliu requested a review from sayakpaul February 13, 2026 17:07
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul sayakpaul left a comment


Thanks a lot for prioritizing it.


[Kernels](https://huggingface.co/docs/kernels/index) is a library for building, distributing, and loading optimized compute kernels on the [Hub](https://huggingface.co/kernels-community). It supports [attention](./attention_backends#set_attention_backend) kernels and custom CUDA kernels for operations like RMSNorm.

The [Diffusers Pipeline Integration](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/references/diffusers-integration.md) guide shows how to integrate a kernel. Create a custom optimized attention processor, patch all modules in the model, and inject the kernel into the pipeline.
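The "patch all modules" step above can be sketched in plain Python. This is an illustrative stand-in for the pattern (walk the model's submodules and swap targeted ones for kernel-backed replacements), not the actual Diffusers or kernels API; all class and function names here are hypothetical.

```python
# Illustrative sketch of the module-patching pattern from the integration
# guide. Plain-Python stand-ins are used instead of torch.nn modules;
# every name below is hypothetical, not part of Diffusers or kernels.

class RMSNorm:
    """Reference module we want to replace."""
    def __init__(self, name):
        self.name = name

class KernelRMSNorm(RMSNorm):
    """Stand-in for a kernel-backed drop-in replacement."""
    pass

class Model:
    def __init__(self):
        # flat registry of named submodules, analogous to
        # torch.nn.Module.named_modules()
        self.modules = {
            "blocks.0.norm": RMSNorm("blocks.0.norm"),
            "blocks.0.attn": object(),
            "blocks.1.norm": RMSNorm("blocks.1.norm"),
        }

def patch_rmsnorm(model):
    """Swap every plain RMSNorm submodule for the kernel-backed version."""
    patched = 0
    for name, module in model.modules.items():
        if type(module) is RMSNorm:  # skip already-patched subclasses
            model.modules[name] = KernelRMSNorm(name)
            patched += 1
    return patched

model = Model()
print(patch_rmsnorm(model))  # → 2
```

The same traversal-and-swap shape applies to an optimized attention processor: locate the attention modules, replace their processor, and the pipeline picks up the kernel on the next forward pass.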
Member


The kernel skill basically lets users get an agent to write custom kernels for a model and hardware. It's not specific to the attention processor; it also covers other modules, such as RMSNorm. Should we make that clearer?

Member Author


lmk if this is clearer!

> [!TIP]
> Install the [add cuda-kernels](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/SKILL.md) skill to teach Claude or Codex how to write a kernel. The [Custom kernels for all from Codex and Claude](https://huggingface.co/blog/custom-cuda-kernels-agent-skills) blog post covers this in more detail.

For example, a custom RMSNorm kernel with [torch.compile](#torchcompile) speeds up LTX-Video generation 1.43x on an H100.
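For reference, the operation that the custom kernel accelerates, RMSNorm, computes y_i = x_i * w_i / sqrt(mean(x^2) + eps). A plain-Python reference sketch (not the CUDA kernel itself, just the math it implements):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a 1-D vector:
    y_i = x_i / sqrt(mean(x^2) + eps) * w_i
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])
# mean(x^2) = 12.5, so out ≈ [0.8485, 1.1314]
```

A fused CUDA kernel performs the same reduction and scaling in one pass, which is where the speedup over an eager per-op implementation comes from.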
Member


It wasn't just RMSNorm; other modules were implemented with custom kernels as well.

Member Author


i mention RMSNorm as an example only for the benchmark results below

Member

@sayakpaul sayakpaul left a comment


Thanks! I would like to also see what @burtenshaw thinks about this.

@stevhliu stevhliu merged commit cbf4d9a into huggingface:main Mar 25, 2026
2 checks passed
@stevhliu stevhliu deleted the kernels branch March 25, 2026 16:32
